Code
x <- 42
class(x)[1] "numeric"
This is a reference book for Basic Statistics and Projects in R course of the Public Health Sciences Course Program at the University of Bern.
In R, variables can possess diverse types. For instance, it is crucial to differentiate between numbers and character strings, as well as tables and plain lists of numbers. The class function aids us in identifying the specific object type we are dealing with.
x <- 42
class(x)[1] "numeric"
y <- "apple"
class(y)[1] "character"
In R, the prevalent approach for storing data frames involves utilizing a data frame. In terms of conceptualization, a data frame can be envisioned as a tabular structure where rows represent observations and columns define the various variables reported for each observation. Data frames prove highly advantageous for data frames due to their ability to combine different data types into a single object. To load the palmerpenguins1 data set into a data frame, we can use the data function in R. The data set that contains measurements of penguins from three different species.
data(penguins)We can now check the class of the object penguins. The tbl_df class is a subclass of data.frame, created in order to have different default behavior. The colloquial term “tibble” refers to a data frame that has the tbl_df class. Tibble is the central data structure for the set of packages known as the tidyverse.
class(penguins)[1] "tbl_df" "tbl" "data.frame"
Now we can inspect the first 10 rows with head
head(penguins, 10)# A tibble: 10 × 8
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
<fct> <fct> <dbl> <dbl> <int> <int>
1 Adelie Torgersen 39.1 18.7 181 3750
2 Adelie Torgersen 39.5 17.4 186 3800
3 Adelie Torgersen 40.3 18 195 3250
4 Adelie Torgersen NA NA NA NA
5 Adelie Torgersen 36.7 19.3 193 3450
6 Adelie Torgersen 39.3 20.6 190 3650
7 Adelie Torgersen 38.9 17.8 181 3625
8 Adelie Torgersen 39.2 19.6 195 4675
9 Adelie Torgersen 34.1 18.1 193 3475
10 Adelie Torgersen 42 20.2 190 4250
# ℹ 2 more variables: sex <fct>, year <int>
The str function in R provides a concise and informative display of the internal structure of an R object.
str(penguins)tibble [344 × 8] (S3: tbl_df/tbl/data.frame)
$ species : Factor w/ 3 levels "Adelie","Chinstrap",..: 1 1 1 1 1 1 1 1 1 1 ...
$ island : Factor w/ 3 levels "Biscoe","Dream",..: 3 3 3 3 3 3 3 3 3 3 ...
$ bill_length_mm : num [1:344] 39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ...
$ bill_depth_mm : num [1:344] 18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ...
$ flipper_length_mm: int [1:344] 181 186 195 NA 193 190 181 195 193 190 ...
$ body_mass_g : int [1:344] 3750 3800 3250 NA 3450 3650 3625 4675 3475 4250 ...
$ sex : Factor w/ 2 levels "female","male": 2 1 1 NA 1 2 1 2 NA NA ...
$ year : int [1:344] 2007 2007 2007 2007 2007 2007 2007 2007 2007 2007 ...
To calculate the mean of a variable in R, you can employ mean
x = 1:10
mean(x)[1] 5.5
We can calculate the mean of the variable “Bill length” in the penguin data set, by combining the mean and summarise. Because we have some NA values specify the na.rm argument of mean.
penguins %>%
summarise(Mean_Bill_Length = mean(bill_length_mm, na.rm = TRUE)) # A tibble: 1 × 1
Mean_Bill_Length
<dbl>
1 43.9
There are three different species in the penguins data set. To calculate the mean for each species we use group_by. Additionally we can use kable to format the output table.
penguins %>%
group_by(species) %>%
summarise(Mean_Bill_Length = mean(bill_length_mm, na.rm = TRUE)) %>%
kable(digits = 2)| species | Mean_Bill_Length |
|---|---|
| Adelie | 38.79 |
| Chinstrap | 48.83 |
| Gentoo | 47.50 |
To calculate the median we can replicate the code from above and insert the function median.
\(\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2\)
var computes the variance of x and y if these are vectors.
| species | Mean_Bill_Length | Median_Bill_Length | Variance_Bill_Length |
|---|---|---|---|
| Adelie | 38.79 | 38.80 | 7.09 |
| Chinstrap | 48.83 | 49.55 | 11.15 |
| Gentoo | 47.50 | 47.30 | 9.50 |
\(\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}\)
| species | Mean_Bill_Length | Median_Bill_Length | Variance_Bill_Length | SD__Bill_Length |
|---|---|---|---|---|
| Adelie | 38.79 | 38.80 | 7.09 | 2.66 |
| Chinstrap | 48.83 | 49.55 | 11.15 | 3.34 |
| Gentoo | 47.50 | 47.30 | 9.50 | 3.08 |
\(IQR = Q3 - Q1\)
summary_stat <- penguins %>%
group_by(species) %>%
summarise(Mean_Bill_Length = mean(bill_length_mm, na.rm = TRUE),
Median_Bill_Length = median(bill_length_mm, na.rm = TRUE),
Variance_Bill_Length = var(bill_length_mm, na.rm = TRUE),
SD_Bill_Length = sd(bill_length_mm, na.rm = TRUE),
IQR_Bill_Length = IQR(bill_length_mm, na.rm = TRUE))
summary_stat %>%
kable(digits = 2)| species | Mean_Bill_Length | Median_Bill_Length | Variance_Bill_Length | SD_Bill_Length | IQR_Bill_Length |
|---|---|---|---|---|---|
| Adelie | 38.79 | 38.80 | 7.09 | 2.66 | 4.00 |
| Chinstrap | 48.83 | 49.55 | 11.15 | 3.34 | 4.73 |
| Gentoo | 47.50 | 47.30 | 9.50 | 3.08 | 4.25 |
\(\rho_{xy} = \frac{{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}}{{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}}\)
The cor function computes the correlation coefficient, which measures the strength and direction of the linear relationship between two numeric variables. If your data contains missing values, you can handle them appropriately when calculating the correlation in R. The cor function has an argument called use, which allows you to specify how to handle missing values. For this example we only include rows with complete observations.
| bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | |
|---|---|---|---|---|
| bill_length_mm | 1.00 | -0.24 | 0.66 | 0.60 |
| bill_depth_mm | -0.24 | 1.00 | -0.58 | -0.47 |
| flipper_length_mm | 0.66 | -0.58 | 1.00 | 0.87 |
| body_mass_g | 0.60 | -0.47 | 0.87 | 1.00 |
In order to utilize ggplot2, it is necessary to acquire familiarity with multiple functions and arguments. Remembering all of these can be challenging, hence we strongly advise keeping the ggplot2 cheat sheet accessible. You can obtain a copy from this link.
With ggplot we have many ways to customize the look of our plot. Try to experiment with: fill color xlab ylab ggtitle labs xlim ylim scale_fill_manual scale_color_manual theme theme_bw.
Histograms are an easy way to see how your data is distributed. They provide a visual representation of the frequency or count of data points falling within specific intervals, known as bins.
penguins %>%
ggplot(aes(bill_length_mm)) +
geom_histogram(color = "black",
fill = "steelblue",
bins = 30) +
theme_bw()Alternatively you can use geom_density to get a quick overview over the distribution.
penguins %>%
ggplot(aes(bill_length_mm)) +
geom_density(color = "black",
fill = "steelblue",
bins = 30) +
theme_bw()Box plots, also known as box-and-whisker plots, are useful statistical tools for visualizing and summarizing the distribution of a data set. They provide a concise summary of several important characteristics of the data, including the center, spread, skewness, and potential outliers.
penguins %>%
ggplot(aes(x = species, y = bill_length_mm, fill = species)) +
geom_boxplot() +
theme_bw()A scatter plot is a great way to visualize relationships and differences between variables and groups within your data sets. In our data set we detect that bill length seems to have a linear relationship with body mass.
penguins %>%
ggplot(aes(x = body_mass_g, y = bill_length_mm, color = species)) +
geom_point() +
theme_bw()An easy way to plot a correlation matrix is to use ggcorrplot.
ggcorrplot(cor_matrix, hc.order = TRUE, type = "lower", lab = TRUE)\(z = \frac{{x - \mu}}{{\sigma}}\)
\(x\) represents the individual data point.
\(\mu\) is the population mean.
\(\sigma\) is the population standard deviation.
To apply a z-transformation in R we can use scale. This will set the mean to 0 and the standard deviation to 1. We can either replace our original value or create a new variable in our data set. With head we just look at the top of the tibble.
# A tibble: 6 × 4
# Groups: species [1]
species flipper_length_mm z_score_1[,1] z_score_2
<fct> <int> <dbl> <dbl>
1 Adelie 181 -1.37 -1.37
2 Adelie 186 -0.605 -0.605
3 Adelie 195 0.772 0.772
4 Adelie NA NA NA
5 Adelie 193 0.466 0.466
6 Adelie 190 0.00709 0.00709
\(P(X \leq x) = \Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(t-\mu)^2/2\sigma^2} dt\)
By utilizing pnorm, we can determine the likelihood of values exceeding 0.2.
pnorm(0.2, mean = 1, sd = 2, lower.tail = FALSE)[1] 0.6554217
To calculate the probability that a bill length is equal to or less than 42.
[1] 0.4825487
We can do this for each species with group_by. Here we calculate the probability that a bill length is equal to or less than 42.
\(CI = \bar{x} \pm z \frac{s}{\sqrt{n}}\)
\(\bar{x}\) represents the sample mean
\(s\) represents the sample standard deviation
\(n\) represents the sample size
\(z\) represents the critical value from the standard normal distribution corresponding to the desired confidence level
A confidence interval is a range of values that is constructed from sample data and is used to estimate an unknown population parameter. It provides a measure of the uncertainty associated with the estimated parameter.
To calculate the confidence interval by hand we start by defining the confidence level and calculating n.
By using ( ) around and $ at end of our pipes we can access the vector which has removed all NA with drop_na.
Calculate sample mean and standard deviation.
Alternatively we can access our summary data frame if we are interested in only one species.
With the [ ] we can access the first element of the means which corresponds to the Adélie species.
bill_length_mean_adelie <- summary_stat$Mean_Bill_Length[1]
bill_length_sd_adelie <- summary_stat$SD_Bill_Length[1]The next step is to define the standard error.
standard_error <- bill_length_sd / sqrt(n)We then compute the t-score related to the confidence level.
First we set alpha as 1 - confidence level.
Then we define our degrees of freedom as n - 1.
alpha = 1 - confidence_level
degrees_of_freedom = n - 1
t_score = qt(p = alpha / 2, df = degrees_of_freedom, lower.tail= FALSE)Calculate the margin error.
margin_error <- t_score * standard_errorNow we can calculate and print the lower and upper bound confidence interval.
lower_bound <- bill_length_mean - margin_error
upper_bound <- bill_length_mean + margin_errorTo print text and stored variables we can use cat.
cat("The", confidence_level * 100, "% confidence interval is [", lower_bound, ",", upper_bound, "].")The 95 % confidence interval is [ 43.34125 , 44.50261 ].
If you have a single sample and want to test if its mean is significantly different from a hypothesized value, you can use t_test. The function calculates a t-statistic and provides a p-value.
Filter the data to only include Adélie penguins.
adelie <- penguins %>%
filter(species == "Adelie")Perform a one-sample t-test on the body mass of Adélie penguins
adelie %>%
t_test(body_mass_g ~ 1) %>%
kable()| .y. | group1 | group2 | n | statistic | df | p |
|---|---|---|---|---|---|---|
| body_mass_g | 1 | null model | 151 | 99.16672 | 150 | 0 |
If you have two independent samples and want to compare their means, you can use the t_test with two sample inputs. The function calculates a t-statistic and provides a p-value.
Filter the data to only include Adélie and Chinstrap penguins.
Perform a two-sample t-test on the body mass of Adélie and Chinstrap penguins.
adelie_chinstrap %>%
t_test(body_mass_g ~ species) %>%
kable()| .y. | group1 | group2 | n1 | n2 | statistic | df | p |
|---|---|---|---|---|---|---|---|
| body_mass_g | Adelie | Chinstrap | 151 | 68 | -0.5430902 | 152.4548 | 0.588 |
To inspect our result we can use ggplot. To plot the test significance on our plot we can use stat_compare_means.
adelie_chinstrap %>%
ggplot(aes(x = species, body_mass_g, fill = species)) +
geom_boxplot() +
theme_bw() +
stat_compare_means(aes(label = after_stat(p.signif)), method = "t.test", ref.group = "Chinstrap")If you have categorical data and want to test the independence of two variables, you can use the chisq_test. The function calculates a chi-square statistic and provides a p-value.
Perform a chi-square test on the species and island variables.
table <- table(penguins$species, penguins$island)
table %>%
chisq_test() %>%
kable()| n | statistic | p | df | method | p.signif |
|---|---|---|---|---|---|
| 344 | 299.5503 | 0 | 4 | Chi-square test | **** |
To see what normal data is supposed to look like, we start by randomly generate numbers with rnorm. We can specify what the mean and the standard deviation of the generated data should be.
set.seed(42)
normal_data <- as.data.frame(rnorm(1000)) %>%
rename(random_numbers = `rnorm(1000)`)
normal_data %>%
head(10) %>%
kable(align = "l")| random_numbers |
|---|
| 1.3709584 |
| -0.5646982 |
| 0.3631284 |
| 0.6328626 |
| 0.4042683 |
| -0.1061245 |
| 1.5115220 |
| -0.0946590 |
| 2.0184237 |
| -0.0627141 |
Now we create data that follows an exponential distribution with rexp.
exp_data <- as.data.frame(rexp(1000, rate=3)) %>%
rename(random_numbers = `rexp(1000, rate = 3)`)
exp_data %>%
head(10) %>%
kable(align = "l")| random_numbers |
|---|
| 0.1013135 |
| 0.2227730 |
| 0.1562810 |
| 0.0979355 |
| 0.2285855 |
| 0.5544748 |
| 0.3826574 |
| 0.2645057 |
| 0.6935407 |
| 0.6574124 |
Create histograms for both data sets.
normal_data %>%
ggplot(aes(random_numbers)) +
geom_histogram(color = "black",
fill = "steelblue",
bins = 25) +
theme_bw()exp_data %>%
ggplot(aes(random_numbers)) +
geom_histogram(color = "black",
fill = "steelblue",
bins = 25) +
theme_bw()To analyze visually analyze normality we can create Q-Q plot for both data sets.
Perform shapiro-wilk test using shapiro_test. We can not reject the null hypothesis because p-value is smaller 0.05.
normal_data %>%
shapiro_test(random_numbers) %>%
kable()| variable | statistic | p |
|---|---|---|
| random_numbers | 0.9988187 | 0.7669591 |
Perform shapiro-wilk test fo exponetial data using shapiro.test. We can reject the null hypothesis because p-value is lower than 0.05.
exp_data %>%
shapiro_test(random_numbers) %>%
kable()| variable | statistic | p |
|---|---|---|
| random_numbers | 0.8243563 | 0 |
For sample size larger than 50 we should use the kolmogorov-smirnov test using ks.test. Because the p-value is less than 0.05, we can reject the null hypothesis. The sample data does not come from a normal distribution.
ks.test(normal_data, 'pnorm')
Asymptotic one-sample Kolmogorov-Smirnov test
data: normal_data
D = 0.025285, p-value = 0.5448
alternative hypothesis: two-sided
ks.test(exp_data, 'pnorm')
Asymptotic one-sample Kolmogorov-Smirnov test
data: exp_data
D = 0.50003, p-value < 0.00000000000000022
alternative hypothesis: two-sided
If you have multiple groups and want to test if their means are significantly different, you can use the function aov followed by the summary to obtain an analysis of variance table with p-values.
Filter the data to only include Adélie, Chinstrap, and Gentoo penguins.
Perform an ANOVA test on the body mass of Adélie, Chinstrap, and Gentoo penguins.
adelie_chinstrap_gentoo %>%
anova_test(body_mass_g ~ species) %>%
kable()Warning: NA detected in rows: 4,272.
Removing this rows before the analysis.
| Effect | DFn | DFd | F | p | p<.05 | ges |
|---|---|---|---|---|---|---|
| species | 2 | 339 | 343.626 | 0 | * | 0.67 |
R package rstatix provides additional functions for Anova and repeated measures Anova
The Mann-Whitney U test is used to compare two independent non-Normally distributed groups. In R, you can use the wilcox.test function to perform this test. By default, the option paired = FALSE for independent data. See documentation for more details.
Filter the data to only include Adélie and Chinstrap penguins.
Perform a Mann-Whitney U test on the body mass of Adélie and Chinstrap penguins.
adelie_chinstrap %>%
wilcox_test(body_mass_g ~ species) %>%
kable()| .y. | group1 | group2 | n1 | n2 | statistic | p |
|---|---|---|---|---|---|---|
| body_mass_g | Adelie | Chinstrap | 151 | 68 | 4831 | 0.485 |
The Kruskal-Wallis test is used to compare the distribution of more than two independent non-Normal groups. In R, you can use the kruskal_test to perform this test.
penguins %>%
kruskal_test(body_mass_g ~ island) %>%
kable()| .y. | n | statistic | df | p | method |
|---|---|---|---|---|---|
| body_mass_g | 344 | 130.0696 | 2 | 0 | Kruskal-Wallis |
When conducting multiple comparisons, it is important to account for the increased likelihood of obtaining false positives. One of the most commonly use corrections is the Bonferroni correction.
Normal data: To perform an Anova for pairwise group comparisons, with Bonferroni correction, use the function:
treatment_outcome <- data.frame(
treatment = rep(c("A", "B", "C"), each = 20),
outcome = c(rnorm(20, mean = 2, sd =2),
rnorm(20, mean = 6, sd = 4),
rnorm(20, mean = 4, sd = 0.8)),
patient_id = rep(1:20, times = 3))
treatment_outcome %>%
pairwise_t_test(outcome ~ treatment, p.adjust.method = "bonferroni")# A tibble: 3 × 9
.y. group1 group2 n1 n2 p p.signif p.adj p.adj.signif
* <chr> <chr> <chr> <int> <int> <dbl> <chr> <dbl> <chr>
1 outcome A B 20 20 0.000000724 **** 0.00000217 ****
2 outcome A C 20 20 0.105 ns 0.314 ns
3 outcome B C 20 20 0.00024 *** 0.000721 ***
Non-Normal Data: To perform a Kruskal Wallis test with Bonferroni correction, use the Dunn Test with the option method. The Dunn Test performs the post hoc pairwise multiple comparisons procedure appropriate to follow up a Kruskal-Wallis test, which is a non-parametric analog of the one-way ANOVA.
| .y. | n | statistic | df | p | method |
|---|---|---|---|---|---|
| treatment | 60 | 47.09886 | 10 | 0.0000009 | Kruskal-Wallis |
treatment_outcome_pois %>%
dunn_test(treatment ~ outcome, p.adjust.method = "bonferroni") %>%
head(10) %>%
kable()| .y. | group1 | group2 | n1 | n2 | statistic | p | p.adj | p.adj.signif |
|---|---|---|---|---|---|---|---|---|
| treatment | 0 | 1 | 25 | 10 | 1.687856 | 0.0914389 | 1.0000000 | ns |
| treatment | 0 | 2 | 25 | 7 | 2.856371 | 0.0042851 | 0.2356825 | ns |
| treatment | 0 | 3 | 25 | 3 | 3.418819 | 0.0006289 | 0.0345915 | * |
| treatment | 0 | 4 | 25 | 2 | 2.842677 | 0.0044736 | 0.2460498 | ns |
| treatment | 0 | 5 | 25 | 4 | 3.879051 | 0.0001049 | 0.0057676 | ** |
| treatment | 0 | 6 | 25 | 3 | 3.418819 | 0.0006289 | 0.0345915 | * |
| treatment | 0 | 7 | 25 | 1 | 2.048367 | 0.0405240 | 1.0000000 | ns |
| treatment | 0 | 8 | 25 | 2 | 2.842677 | 0.0044736 | 0.2460498 | ns |
| treatment | 0 | 9 | 25 | 1 | 2.048367 | 0.0405240 | 1.0000000 | ns |
| treatment | 0 | 11 | 25 | 2 | 2.842677 | 0.0044736 | 0.2460498 | ns |
One easy way to calculate the ICC is to first calculate an Anova table that renders the Sum Of Squares (SS) for the explanatory (or grouping or pairing) variable considered. The SS are the partition of the variability between groups/panels and within groups/panels. Then we can use the simple formula below to obtain the ICC.
\(ICC = \frac{ss_{between}}{ss_{betwenn} - ss_{within}}\)
Load the data self esteem from R package datarium . The data set contains 10 individuals’ self-esteem score on three time points during a specific diet to determine whether their self-esteem improved.
# A tibble: 5 × 4
id t1 t2 t3
<int> <dbl> <dbl> <dbl>
1 1 4.01 5.18 7.11
2 2 2.56 6.91 6.31
3 3 3.24 4.44 9.78
4 4 3.42 4.71 8.35
5 5 2.87 3.91 6.46
selfesteem <- selfesteem %>%
gather(key = "time", value = "score", t1, t2, t3) %>%
convert_as_factor(id, time)
head(selfesteem, 5) %>%
kable()| id | time | score |
|---|---|---|
| 1 | t1 | 4.005027 |
| 2 | t1 | 2.558124 |
| 3 | t1 | 3.244241 |
| 4 | t1 | 3.419538 |
| 5 | t1 | 2.871243 |
For the ICC calculation, we are not interested in the differences over time. Note that we calculate the anova table just to obtain the SS’s for the groups. In this case, the grouping variable is the id of every person.
Df Sum Sq Mean Sq F value Pr(>F)
id 9 4.57 0.508 0.085 1
Residuals 20 119.08 5.954
#ICC <- ss-between / (ss-between + ss-within)
ICC <- 4.57/(4.57+119.08)
cat("ICC is:",ICC)ICC is: 0.03695916
In this case, the ICC is very small, so most of the variance is not at the person level but over time. There is a clear increasing trend over time.
Alternatively you can use anova_test.
selfesteem %>%
anova_test(score ~ id) %>%
kable()| Effect | DFn | DFd | F | p | p<.05 | ges |
|---|---|---|---|---|---|---|
| id | 9 | 20 | 0.085 | 1 | 0.037 |
R package PairedData provides functions to plot line-boxplots (profile plots), Bland-Altman plots and sliding chart plots
Bland-Altman plot to measure agreement between two methods:
Because paired data violate the assumption of data independence, we need to perform a non-parametric test, the Wilcoxon signed-rank test, with R function wilcox.test with the option paired = TRUE. This test only works for two groups.
Example data set: weight of the same 10 mice before and after a certain treatment.
First, we need to put the data into two numeric vectors
# Weight of the mice before treatment
before <-c(200.1, 190.9, 192.7, 213, 241.4, 196.9, 172.2, 185.5, 205.2, 193.7)
# Weight of the mice after treatment
after <-c(392.9, 393.2, 345.1, 393, 434, 427.9, 422, 383.9, 392.3, 352.2)
# Create a data frame
my_data <- data.frame(
group = rep(c("before", "after"), each = 10),
weight = c(before, after))res <- wilcox.test(before, after, paired = TRUE)
res
Wilcoxon signed rank exact test
data: before and after
V = 0, p-value = 0.001953
alternative hypothesis: true location shift is not equal to 0
Print only the p-value
res$p.value[1] 0.001953125
R package rstatix provides a series of functions to work with paired data or repeated measures, several groups and correct for multiple comparisons.
df <- ToothGrowth
df$dose <- as.factor(df$dose)Use function t_test to obtain p-values corrected for multiple comparisons, here we set paired = t
pairwise.test <- df %>% t_test(len ~ dose, paired = TRUE)
pairwise.test %>%
kable()| .y. | group1 | group2 | n1 | n2 | statistic | df | p | p.adj | p.adj.signif |
|---|---|---|---|---|---|---|---|---|---|
| len | 0.5 | 1 | 20 | 20 | -6.966881 | 19 | 0.0000012 | 0.0000025 | **** |
| len | 0.5 | 2 | 20 | 20 | -11.291494 | 19 | 0.0000000 | 0.0000000 | **** |
| len | 1 | 2 | 20 | 20 | -4.604647 | 19 | 0.0001930 | 0.0001930 | *** |
df %>%
ggboxplot(x = "dose", y = "len", fill = "steelblue") +
stat_pvalue_manual(
pairwise.test, label = "p.adj.signif",
y.position = c(29, 35, 39)) + theme_bw()